Selection of Passages for Information Reduction

Author

  • Jody J. Daniels
Abstract

A bottleneck currently exists in extracting information from pre-existing texts to generate a symbolic representation that can be used by a case-based reasoning (CBR) system. Symbolic case representations are used in the legal and medical domains, among others. Finding similar cases in the legal domain is crucial because of the importance precedents play when arguing a case. Further, by examining the features and decisions of previous cases, an advocate or judge can decide how to handle a current problem. In the medical domain, remembering or finding cases similar to the current patient’s may be key to making a correct diagnosis: such cases may provide insight into how an illness should be treated or which treatments may prove most effective. This thesis demonstrates methods of locating, automatically and quickly, those textual passages in previously unseen texts that relate to predefined important features. The important features are those defined for use by a CBR system as slots and fillers, and they constitute the frame-based representation of a text or case. Broadly, we use a set of textual “annotations” associated with each slot to generate an information retrieval (IR) query. Each query is aimed at locating the set of passages most likely to contain information about the slot under consideration. Currently, a user must read through many pages of text in order to find fillers for all the slots in a case-frame. This is a huge manual undertaking, particularly when there are fifty or more texts. Unfortunately, full-text understanding is not yet feasible as an alternative, and information extraction techniques themselves rely on large numbers of training texts with manually encoded answer keys. By locating and presenting relevant passages to the user, we significantly reduce the time and effort required. Alternatively, we could save an automated information extraction system from processing an entire text by focusing the system on those portions of the text most likely to contain the desired information. This work integrates a case-based reasoner with an IR engine to reduce the information bottleneck. SPIRE [Se-
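The slot-by-slot querying described in the abstract can be illustrated with a minimal sketch. It is not SPIRE's implementation: the function names, the fixed-size overlapping word windows, and the simple term-overlap score are all assumptions chosen for illustration, standing in for whatever query formulation and ranking a real IR engine would provide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(annotations):
    """Pool the terms of all annotations for one slot into a bag of words.

    `annotations` is a hypothetical list of short text snippets that were
    previously associated with the slot.
    """
    query = Counter()
    for note in annotations:
        query.update(tokenize(note))
    return query


def split_passages(text, size=20, step=10):
    """Cut the text into overlapping windows of `size` tokens (assumed scheme)."""
    tokens = tokenize(text)
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + step, step)]


def rank_passages(text, annotations, top_k=3, size=20, step=10):
    """Return the top_k passages as (score, window_index, passage_text).

    The score is the bag-of-words overlap between the passage and the
    slot query; ties are broken by position in the text.
    """
    query = build_query(annotations)
    scored = []
    for index, passage in enumerate(split_passages(text, size, step)):
        counts = Counter(passage)
        score = sum(min(counts[term], query[term]) for term in query)
        scored.append((score, index, " ".join(passage)))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical text and slot annotations, for illustration only.
    text = (
        "The court met on Monday. " * 5
        + "The debtor earns a monthly income of 1200 dollars from wages. "
        + "Nothing else was said here. " * 5
    )
    annotations = ["monthly income", "earns wages"]
    for score, index, passage in rank_passages(text, annotations, top_k=1, size=12, step=6):
        print(score, index, passage)
```

Running one such query per slot would give the user, or a downstream extraction system, a short ranked list of passages to inspect instead of the full text.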

Similar resources

The Impact of Contextual Clue Selection on Inference

Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out from the text with...

Detecting Hidden Passages in Documents

Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. We present our methodology to detect such hidden passages within a document. A document is divided into passages using various document splitting techniques, and a text classifier is used to classify such passag...

Optimal Wavelength Selection in Ultraviolet Spectroscopy for the Estimation of Toxin Reduction Ratio during Hemodialysis

Introduction: The concentration of substances, including urea, creatinine, and uric acid, can be used as an index to measure toxic uremic solutes in the blood during dialysis and interdialytic intervals. The on-line monitoring of toxin concentration allows for the clearance measurement of some low-molecular-weight solutes at any time during hemodialysis. The aim of this study was to determine the...

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

The increasing use of the Internet and phenomena such as sensor networks has led to an unnecessary increase in the volume of information. Though this has many benefits, it causes problems such as the need for more storage space and better processors, as well as data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...

IR-n, a Passage Retrieval System from University of Alicante, at Clef 2001

Previous work showed that using document passages as the basic unit of information when calculating the relevance of a document to a question appreciably improves the results of information retrieval systems. However, the IR community has not reached a consensus on how to define those text passages so that systems can improve their efficiency. This paper reports on experiments with IR-n sys...

Publication date: 1996